186 research outputs found
SAMNetWeb: identifying condition-specific networks linking signaling and transcription
Motivation: High-throughput datasets such as genetic screens, mRNA expression assays and global phospho-proteomic experiments are often difficult to interpret due to inherent noise in each experimental system. Computational tools have improved interpretation of these datasets by enabling the identification of biological processes and pathways that are most likely to explain the measured results. These tools are primarily designed to analyse data from a single experiment (e.g. drug treatment versus control), creating a need for computational algorithms that can handle heterogeneous datasets across multiple experimental conditions at once.
Summary: We introduce SAMNetWeb, a web-based tool that enables functional enrichment analysis and visualization of high-throughput datasets. SAMNetWeb can analyse two distinct data types (e.g. mRNA expression and global proteomics) simultaneously across multiple experimental systems to identify pathways activated in these experiments and then visualize the pathways in a single interaction network. Through the use of a multi-commodity flow based algorithm that requires each experiment 'share' underlying protein interactions, SAMNetWeb can identify distinct and common pathways across experiments.
Availability and implementation: SAMNetWeb is freely available at http://fraenkel.mit.edu/samnetweb. United States. National Institutes of Health (U54CA112967); United States. National Institutes of Health (R01GM089903); National Science Foundation (U.S.) (DB1-0821391
WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches
WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. Whitaker Foundation; Massachusetts Institute of Technology. Undergraduate Research Opportunities Program; John S. Reed Fun
Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model
Cataloging the neuronal cell types that comprise circuitry of individual
brain regions is a major goal of modern neuroscience and the BRAIN initiative.
Single-cell RNA sequencing can now be used to measure the gene expression
profiles of individual neurons and to categorize neurons based on their gene
expression profiles. While the single-cell techniques are extremely powerful
and hold great promise, they are currently still labor intensive, have a high
cost per cell, and, most importantly, do not provide information on spatial
distribution of cell types in specific regions of the brain. We propose a
complementary approach that uses computational methods to infer the cell types
and their gene expression profiles through analysis of brain-wide single-cell
resolution in situ hybridization (ISH) imagery contained in the Allen Brain
Atlas (ABA). We measure the spatial distribution of neurons labeled in the ISH
image for each gene and model it as a spatial point process mixture, whose
mixture weights are given by the cell types which express that gene. By fitting
a point process mixture model jointly to the ISH images, we infer both the
spatial point process distribution for each cell type and their gene expression
profile. We validate our predictions of cell type-specific gene expression
profiles using single cell RNA sequencing data, recently published for the
mouse somatosensory cortex. Jointly with the gene expression profiles, cell
features such as cell size, orientation, intensity and local density level are
inferred per cell type
SteinerNet: a web server for integrating 'omic' data to discover hidden components of response pathways
High-throughput technologies including transcriptional profiling, proteomics and reverse genetics screens provide detailed molecular descriptions of cellular responses to perturbations. However, it is difficult to integrate these diverse data to reconstruct biologically meaningful signaling networks. Previously, we have established a framework for integrating transcriptional, proteomic and interactome data by searching for the solution to the prize-collecting Steiner tree problem. Here, we present a web server, SteinerNet, to make this method available in a user-friendly format for a broad range of users with data from any species. At a minimum, a user only needs to provide a set of experimentally detected proteins and/or genes and the server will search for connections among these data from the provided interactomes for yeast, human, mouse, Drosophila melanogaster and Caenorhabditis elegans. More advanced users can upload their own interactome data as well. The server provides interactive visualization of the resulting optimal network and downloadable files detailing the analysis and results. We believe that SteinerNet will be useful for researchers who would like to integrate their high-throughput data for a specific condition or cellular response and to find biologically meaningful pathways. SteinerNet is accessible at http://fraenkel.mit.edu/steinernet. National Institutes of Health (U.S.) (U54-CA112967); National Institutes of Health (U.S.) (R01-GM089903); National Science Foundation (Award Number DB1-0821391)
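For orientation, the prize-collecting Steiner tree problem named in the abstract is usually stated as the optimization below. This is the textbook form, not necessarily the exact objective SteinerNet solves; the symbols G, T, c, p and the trade-off weight \beta are notation assumed here for illustration.

```latex
% G = (V, E): interactome graph; T = (V_T, E_T): a connected subtree of G.
% c(e) >= 0: cost of including interaction e (e.g. low confidence = high cost).
% p(v) >= 0: prize for including node v (e.g. strength of experimental evidence).
% \beta > 0: assumed scaling parameter trading prizes against edge costs.
\min_{T = (V_T, E_T) \subseteq G} \;
  \sum_{e \in E_T} c(e) \;+\; \beta \sum_{v \notin V_T} p(v)
```

Under this formulation a node with no prize can still enter the tree when it cheaply connects prize-bearing nodes, which is one way such a method can surface components that were not detected experimentally.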
Unsupervised learning of transcriptional regulatory networks via latent tree graphical models
Gene expression is a readily-observed quantification of transcriptional activity and cellular state that enables the recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data is a problem that has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that regulator activity is represented by mRNA levels. We use a latent tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for regulator activity. The latent tree model is a type of Markov random field that includes both observed gene variables and latent (hidden) variables, which factorize on a Markov tree. Through efficient unsupervised learning approaches, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of those regulators. Post-processing annotates many of these discovered latent variables as specific transcription factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators but instead reveal hidden structure in the gene expression such as shared biological function. We apply the latent tree graphical model to a yeast stress response dataset. In addition to novel predictions, such as condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory network. These include groups of co-regulated genes, condition-specific regulator activity, and combinatorial regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene expression data that requires no prior knowledge of which possible regulators exist, regulator activity, or where transcription factors physically bind
SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets
The rapid development of high throughput biotechnologies has led to an onslaught of data describing genetic perturbations and changes in mRNA and protein levels in the cell. Because each assay provides a one-dimensional snapshot of active signaling pathways, it has become desirable to perform multiple assays (e.g. mRNA expression and phospho-proteomics) to measure a single condition. However, as experiments expand to accommodate various cellular conditions, proper analysis and interpretation of these data have become more challenging. Here we introduce a novel approach called SAMNet, for Simultaneous Analysis of Multiple Networks, that is able to interpret diverse assays over multiple perturbations. The algorithm uses a constrained optimization approach to integrate mRNA expression data with upstream genes, selecting edges in the protein–protein interaction network that best explain the changes across all perturbations. The result is a putative set of protein interactions that succinctly summarizes the results from all experiments, highlighting the network elements unique to each perturbation. We evaluated SAMNet in both yeast and human datasets. The yeast dataset measured the cellular response to seven different transition metals, and the human dataset measured cellular changes in four different lung cancer models of Epithelial-Mesenchymal Transition (EMT), a crucial process in tumor metastasis. SAMNet was able to identify canonical yeast metal-processing genes unique to each commodity in the yeast dataset, as well as human genes such as β-catenin and TCF7L2/TCF4 that are required for EMT signaling but escaped detection in the mRNA and phospho-proteomic data. Moreover, SAMNet also highlighted drugs likely to modulate EMT, identifying a series of less canonical genes known to be affected by the BCR-ABL inhibitor imatinib (Gleevec), suggesting a possible influence of this drug on EMT. National Institutes of Health (U.S.) (Grant U54CA112967); National Institutes of Health (U.S.) (Grant R01GM089903); National Science Foundation (U.S.) (Award DB1-0821391); Massachusetts Institute of Technology. Undergraduate Research Opportunities Progra
Remote Inference of Cognitive Scores in ALS Patients Using a Picture Description
Amyotrophic lateral sclerosis is a fatal disease that not only affects
movement, speech, and breath but also cognition. Recent studies have focused on
the use of language analysis techniques to detect ALS and infer scales for
monitoring functional progression. In this paper, we focused on another
important aspect, cognitive impairment, which affects 35-50% of the ALS
population. In an effort to reach the ALS population, which frequently exhibits
mobility limitations, we implemented the digital version of the Edinburgh
Cognitive and Behavioral ALS Screen (ECAS) test for the first time. This test,
which is designed to measure cognitive impairment, was remotely performed by 56
participants from the EverythingALS Speech Study. As part of the study,
participants (ALS and non-ALS) were asked to describe one picture each week from a
pool of many pictures with complex scenes displayed on their computer at home.
We analyze the descriptions performed within +/- 60 days from the day the ECAS
test was administered and extract different types of linguistic and acoustic
features. We input those features into linear regression models to infer 5 ECAS
sub-scores and the total score. Speech samples from the picture description are
reliable enough to predict the ECAS sub-scores, achieving statistically
significant Spearman correlation values between 0.32 and 0.51 for the model's
performance using 10-fold cross-validation. Comment: conference pape
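The evaluation described in this abstract (linear regression scored by Spearman correlation under 10-fold cross-validation) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the feature matrix, the coefficients, and the helper names `ranks`, `spearman` and `cross_val_predict_linear` are all assumptions made for the example, and the rank helper assumes no tied values.

```python
import numpy as np

def ranks(values):
    """Return 0-based ranks; assumes no ties (reasonable for continuous data)."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def cross_val_predict_linear(X, y, n_folds=10, seed=0):
    """Out-of-fold predictions from ordinary least squares with an intercept."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    predictions = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(indices, n_folds):
        train_idx = np.setdiff1d(indices, test_idx)
        # Fit on the training folds only, then predict the held-out fold.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        predictions[test_idx] = A_test @ coef
    return predictions

# Synthetic stand-in data: 56 participants, 4 made-up features (hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(56, 4))
y = X @ np.array([1.0, -0.5, 0.0, 0.25]) + rng.normal(scale=0.5, size=56)

y_pred = cross_val_predict_linear(X, y)
rho = spearman(y, y_pred)
```

Scoring the pooled out-of-fold predictions, as done here, is one common convention; averaging a per-fold correlation is another, and the abstract does not say which was used.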
Adaptive Bias Correction for Improved Subseasonal Forecasting
Subseasonal forecasting, predicting temperature and
precipitation 2 to 6 weeks ahead, is critical for effective
water allocation, wildfire management, and drought and flood mitigation. Recent
international research efforts have advanced the subseasonal capabilities of
operational dynamical models, yet temperature and precipitation prediction
skill remains poor, partly due to stubborn errors in representing atmospheric
dynamics and physics inside dynamical models. To counter these errors, we
introduce an adaptive bias correction (ABC) method that combines
state-of-the-art dynamical forecasts with observations using machine learning.
When applied to the leading subseasonal model from the European Centre for
Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting
skill by 60-90% and precipitation forecasting skill by 40-69% in the contiguous
U.S. We couple these performance improvements with a practical workflow, based
on Cohort Shapley, for explaining ABC skill gains and identifying higher-skill
windows of opportunity based on specific climate conditions. Comment: 16 pages of main paper and 2 pages of appendix tex
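To make the idea of bias correction concrete, the sketch below applies the simplest possible variant: subtracting the mean error of recent forecasts. It is not the ABC method from the abstract, which uses machine learning; the function name `debias`, the `window` parameter and the toy numbers are assumptions for illustration, and the sketch ignores the fact that at subseasonal lead times the most recent observations are not yet available when a forecast is issued.

```python
import numpy as np

def debias(forecasts, observations, window=3):
    """Subtract the mean error of the previous `window` forecasts.

    forecasts[t] and observations[t] refer to the same target date. The
    correction at step t uses only steps before t, so it never looks ahead.
    The first forecast has no history and is returned unchanged.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    observations = np.asarray(observations, dtype=float)
    corrected = forecasts.copy()
    for t in range(1, len(forecasts)):
        start = max(0, t - window)
        bias = np.mean(forecasts[start:t] - observations[start:t])
        corrected[t] = forecasts[t] - bias
    return corrected

# Toy example: a model that always runs 2 degrees too warm.
obs = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
raw = obs + 2.0
fixed = debias(raw, obs)
```

With a constant bias, every forecast after the first is corrected exactly; real model errors vary with season and climate state, which is the motivation for an adaptive, learned correction.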